On using spoken data in corpus lexicography

Author

  • Rosamund MOON
Abstract

Corpora are increasingly used in lexicography in order to provide good evidence for dictionary statements: the inclusion of spoken data in corpora is generally considered important. This paper raises some issues connected with the use of spoken data. It points out that the extensive differences between written and spoken language have great consequences for dictionary-making. It argues that the repercussions have not yet been fully thought through, and suggests that new models for the lexicographical description of spoken language may have to be developed.

Similar resources

A Corpus-Driven Study of the Variation of Co-Occurrence Patterns in Written and Spoken Registers

This paper will focus on the study of the variation of co-occurrence patterns encountered in written and spoken registers, through the analysis of a large lexical database of corpus-extracted multiword expressions (MWEs) of European Portuguese. Those MWEs were automatically extracted from a balanced 50 million word written corpus and a 1 million word spoken corpus, furthermore statistically int...

Language Technology for Normalisation of Less-Resourced Languages

This paper describes the stages involved in implementing a corpus of spoken Irish. This pilot project (consisting of approximately 140K words of transcribed data) implements part of the design of a larger corpus of spoken Irish which it is hoped will contain approximately 2 million words when complete. It is hoped that such a corpus will provide material for linguistic research, lexicography, the ...

Elsnet 97 Summer School Lexicon Development for Language and Speech Processing Draft Course Materials Computational Lexicography for Speech and Language 1.2 Lexical Resources 1.4 Lexical Databases and System Lexica

This collection of course materials borrows from various materials, partly published and partly unpublished course materials. It is heterogeneous, uneven, and currently in very rough shape, but contains a variety of different kinds of material relevant to spoken language lexicography. The collection st...

The A'lam Corpus: A Standard Named-Entity Corpus for the Persian Language

Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...

On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data

The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...

Publication date: 2008